SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing

Abstract

For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art data analytic methods fail to make full use of the available network and computing resources. The main reason is that such methods must wait for the bottleneck site to complete the corresponding transmission and computation in each phase. Furthermore, such methods may be impractical under bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with joint consideration of network dynamics and job parallelism. In SDTP, a site can execute its computation, provided it obtains the required input data. As a result, the data loading, map, shuffle, and reduce phases at a site need not wait for the completion of the previous phase at other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing it to dynamic situations. Trace-driven results demonstrate that SDTP reduces job response time by 19% to 72% compared with state-of-the-art methods.
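The gap between phase-by-phase waiting and per-site progression can be illustrated with a toy timing model. The sketch below is not the SDTP algorithm from the paper; it is a minimal illustration under stated assumptions: each phase has a hypothetical matrix `transfer[i][j]` (time to move site `i`'s output to site `j`) and a list `compute[j]` (time for site `j` to process its input), and all function names and numbers are invented for the example.

```python
from typing import List

Matrix = List[List[float]]


def barrier_makespan(transfer: List[Matrix], compute: List[List[float]]) -> float:
    """Assumed baseline: every phase waits for the slowest transfer,
    then for the slowest computation, before the next phase starts."""
    clock = 0.0
    for phase_transfer, phase_compute in zip(transfer, compute):
        clock += max(max(row) for row in phase_transfer)  # slowest link
        clock += max(phase_compute)                       # slowest site
    return clock


def simultaneous_makespan(transfer: List[Matrix], compute: List[List[float]]) -> float:
    """Assumed per-site model: site j starts a phase as soon as its own
    inputs from every site i have arrived, regardless of other sites."""
    n = len(compute[0])
    done = [0.0] * n  # time at which each site finished the previous phase
    for phase_transfer, phase_compute in zip(transfer, compute):
        ready = [
            max(done[i] + phase_transfer[i][j] for i in range(n))
            for j in range(n)
        ]
        done = [ready[j] + phase_compute[j] for j in range(n)]
    return max(done)


# Hypothetical two-site, two-phase job (arbitrary time units).
example_transfer = [
    [[0, 4], [1, 0]],  # phase 1: transfer[i][j]
    [[0, 2], [5, 0]],  # phase 2
]
example_compute = [
    [3, 1],  # phase 1: compute[j]
    [2, 2],  # phase 2
]

print(barrier_makespan(example_transfer, example_compute))       # 14.0
print(simultaneous_makespan(example_transfer, example_compute))  # 12.0
```

In this toy model the per-site schedule can never be slower than the barrier schedule, because each site's start time is bounded by the global maximum that the barrier schedule waits for. The model ignores bandwidth sharing and time-varying link speeds, which the abstract says SDTP does take into account.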

Related resources

PIXIDA: Optimizing Data Parallel Jobs in Wide-Area Data Analytics

In the era of global-scale services, big data analytical queries are often required to process datasets that span multiple data centers (DCs). In this setting, cross-DC bandwidth is often the scarcest, most volatile, and/or most expensive resource. However, current widely deployed big data analytics frameworks make no attempt to minimize the traffic traversing these links. In this paper, we pre...

Lube: Mitigating Bottlenecks in Wide Area Data Analytics

Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It becomes increasingly challenging but imperative to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-estab...

Cluster-to-cluster data transfer with data compression over wide-area networks

The recent emergence of ultra high-speed networks up to 100 Gb/s has posed numerous challenges and has led to many investigations on efficient protocols to saturate 100 Gb/s links. However, end-to-end data transfers involve many components, not only protocols, affecting overall transfer performance. These components include disk I/O subsystem, additional computation associated with data streams...

The Wide Area Data

Sharing global remote data over large networks poses two major problems: firstly, the data must be discovered; and secondly, the data must be made accessible to the application. Our aim is to provide a single unified interface to both local and remote data, removing location dependence and improving performance. Our solution incorporates shared memory and caching techniques. A location server prov...

Scalable Bulk Data Transfer in Wide Area Networks

Bulk data transfer in wide area networks (WAN) requires scalable and high network bandwidth. In this paper, we identify a number of the scalability limitations that affect the full utilization of peak theoretical network bandwidth. In addition, we study and classify different offered approaches to overcome some of the identified limitations and increase network bandwidth among Grid components i...

Journal

Journal title: IEEE Transactions on Cloud Computing

Year: 2023

ISSN: 2168-7161, 2372-0018

DOI: https://doi.org/10.1109/tcc.2021.3119991